Anthropogenic climate change is a global issue that has caused, and will continue to cause, irreversible damage to a significant proportion of the global population (Masson-Delmotte et al., 2021; Pörtner et al., 2022). We need to monitor how humans are influencing the climate so that we can adjust our behaviour to mitigate the worst effects of a changing climate. Greenhouse gas emissions and climate indicators are important because they measure our direct influence on the climate system (Fawzy et al., 2020). In this report I will explore an emissions variable and an indicator variable, showing how open-source tools and data can be used by anyone to make a simple assessment of a changing climate.
When looking at the causes of anthropogenic climate change we often look at carbon dioxide (CO\(_2\)) emissions. Emissions are difficult to measure at the source, but luckily we have long-term time series of atmospheric CO\(_2\) concentration, so we can explore its relationship with climate conditions. But how do we measure climate conditions? A key player in the climate system is the ocean, which has a huge influence on heat distribution and transmission globally. Heat distribution drives weather, the extremes of which are a good barometer of climate change. So the temperature of the ocean, particularly at the surface where it drives winds and localised weather, can be a good (but not complete!) measure of climate conditions. This measure is called Sea Surface Temperature (SST) and is a common feature of climate models.
I want to look at the relationship between SST and CO\(_2\), as well as looking at how these variables vary over space and time. This may give an indication of how anthropogenic climate change correlates to more extreme weather, using CO\(_2\) concentration and SST as our representatives.
I will be exploring the following questions:
I will explore these by manipulating and visualising the data in various formats. I’ll start with SST, then move to CO\(_2\), then combine the data and explore them together. All of my code will be presented so you can complete, critique, and build upon the analyses yourself!
If you have downloaded the markdown file and want to run this yourself, make sure you download the data (placed in a folder named ‘data’ in the same directory as the markdown file). You will also need to install all of the packages I use (see Setting Up). Note that a couple of the visualisations take a while to render, especially the SST animation, which also takes up ~150 MB of storage (although you can delete the frames as soon as you have rendered the video).
As I mentioned before, we have long-term time series of CO\(_2\) data. I will use the Mauna Loa open dataset, which goes back to the mid-1950s and measures CO\(_2\) concentration at a station in Hawaii. The data are maintained by the National Oceanic and Atmospheric Administration (NOAA). You can find the dataset at NOAA’s Global Monitoring Laboratory, which is open and accessible with attribution (Keeling et al., 2001).
Measuring SST directly is difficult as it requires vessel expeditions. These days you can measure from space with satellites, but those time series only go back to the ~1980s, when suitable satellites were first launched. If we want to go further back we need models. I will be using the Extended Reconstructed Sea Surface Temperature (ERSST) model data. ERSST data are derived from a set of marine/ocean observations, then modelled alongside other climate variables. The data are monthly averages, ranging from 1854 to 2020. You can find the dataset on the NOAA gridded data site. This dataset is also open and accessible (Huang et al., 2017).
First let’s load up all the packages needed for the analyses. I will also set the working directory - if you have downloaded the zip file it should work automatically, setting the working directory to be the location of the folder.
# load packages to use throughout
library(this.path) # easy obtaining of source file dir/path
library(ggplot2) # plotting
library(ncdf4) # netcdf parsing
library(raster) # raster analysis
library(tidyverse) # data manipulation
library(lubridate) # date manipulation
library(tmap) # mapping
library(av) # video file creation
library(sf) # spatial feature manipulation
library(scales) # useful for setting labels/removing sci notation
library(latex2exp) # latex rendering for equations/labels
library(ggfortify) # plotting and exploring models
library(plotly) # interactive plotting
# set working directory to that of this rmd file
# this is assuming zip file with contents were not modified, such that the /data/ relative path will work
setwd(this.path::here())
# create directories for plots
# these will not crash, only warn if the directories already exist, so
# no need to check if folders exist already
dir.create('vis_frames/')
dir.create('plots/')
There are a few options for loading NetCDF data, which is the format the SST data are saved in. First I will use the ncdf4 library to load it and look at the metadata.
# load sea surface temperature data with ncdf4 library
sst <- nc_open('data/sst.mnmean.v3.nc')
sst # print data info
## File data/sst.mnmean.v3.nc (NC_FORMAT_NETCDF4_CLASSIC):
##
## 2 variables (excluding dimension variables):
## double time_bnds[nbnds,time]
## long_name: Time Boundaries
## float sst[lon,lat,time]
## long_name: Monthly Means of Sea Surface Temperature
## units: degC
## precision: 2
## least_significant_digit: 1
## var_desc: Sea Surface Temperature
## dataset: NOAA Extended Reconstructed SST V3b
## level_desc: Surface
## statistic: Mean
## parent_stat: Mean
## missing_value: -9.96920996838687e+36
## actual_range: -1.79999995231628
## actual_range: 33.9500007629395
## valid_range: -5
## valid_range: 40
##
## 4 dimensions:
## lon Size:180
## units: degrees_east
## long_name: Longitude
## actual_range: 0
## actual_range: 358
## standard_name: longitude
## axis: X
## coordinate_defines: center
## lat Size:89
## units: degrees_north
## long_name: Latitude
## actual_range: 88
## actual_range: -88
## standard_name: latitude
## axis: Y
## coordinate_defines: center
## nbnds Size:2
## [1] "vobjtovarid4: **** WARNING **** I was asked to get a varid for dimension named nbnds BUT this dimension HAS NO DIMVAR! Code will probably fail at this point"
## time Size:1994 *** is unlimited ***
## units: days since 1800-1-1 00:00:00
## long_name: Time
## delta_t: 0000-01-00 00:00:00
## avg_period: 0000-01-00 00:00:00
## prev_avg_period: 0000-00-07 00:00:00
## standard_name: time
## axis: T
## actual_range: 19723
## actual_range: 80384
##
## 12 global attributes:
## title: NOAA Extended Reconstructed SST V3b
## Conventions: CF-1.0
## history: Thu Jul 1 14:03:49 2010: ncatted -O -a _FillValue,sst,o,s,32767 sst.mnmean.v3b.nc
## created 09/2007 by CAS
## comments: The extended reconstructed sea surface temperature (ERSST)
## was constructed using the most recently available
## Comprehensive Ocean-Atmosphere Data Set (COADS) SST data
## and improved statistical methods that allow stable
## reconstruction using sparse data.
## Currently, ERSST version 2 (ERSST.v2) and version 3 (ERSST.v3) and ERSST.v3b) are available from NCDC.
## ERSST.v3b is an improved extended reconstruction over version 2.
## but with no satellite data
## platform: Model
## source: NOAA/NESDIS/National Climatic Data Center
## institution: NOAA/NESDIS/National Climatic Data Center
## citation: Smith, T.M., R.W. Reynolds, Thomas C. Peterson, and Jay Lawrimore 2007: Improvements to NOAA's Historical Merged Land-Ocean Surface Temperature Analysis (1880-2006). In press. Journal of Climate (ERSSTV3).
## Smith, T.M., and R.W. Reynolds, 2003: Extended Reconstruction of Global Sea Surface Temperatures Based on COADS Data (1854-1997). Journal of Climate, 16, 1495-1510. ERSSTV1
## Smith, T.M., and R.W. Reynolds, 2004: Improved Extended Reconstruction of SST (1854-1997). Journal of Climate, 17, 2466-2477.
## dataset_title: Extended Reconstructed Sea Surface Temperature (ERSST)
## source_doc: https://www.ncdc.noaa.gov/data-access/marineocean-data/extended-reconstructed-sea-surface-temperature-ersst-v3b
## data_modified: 2020-03-03
## References: https://www.psl.noaa.gov/data/gridded/data.noaa.ersst.html
We can see a lot of information here! We have 2 variables - the time boundaries (time_bnds) and SST. The sst variable has longitude (lon), latitude (lat) and time dimensions. Latitude and longitude are our y and x values respectively, so we can make maps; they are recorded on a 2\(^\circ\) grid. Time is in units of days since 1800-01-01… we will probably have to convert this. Beyond that, the global attributes give us lots of other information about the data.
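As a quick sanity check of the time unit, the raw values can be read with ncvar_get and offset from the 1800-01-01 origin. A minimal sketch, assuming `sst` is still the open connection from above:

```r
# read the raw time values (days since 1800-01-01) from the open nc connection
time_raw <- ncvar_get(sst, 'time')
# offset from the stated origin to get calendar dates
time_dates <- as.Date(time_raw, origin = '1800-01-01')
head(time_dates) # the first value (19723 days) corresponds to 1854-01-01
```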
Next I will load up the data as a stack of spatial rasters so we can map everything.
# load sst data as raster stack
sst_rast <- stack('data/sst.mnmean.v3.nc')
sst_rast # print data info
## class : RasterStack
## dimensions : 89, 180, 16020, 1994 (nrow, ncol, ncell, nlayers)
## resolution : 2, 2 (x, y)
## extent : -1, 359, -89, 89 (xmin, xmax, ymin, ymax)
## crs : +proj=longlat +datum=WGS84 +no_defs
## names : X1854.01.01, X1854.02.01, X1854.03.01, X1854.04.01, X1854.05.01, X1854.06.01, X1854.07.01, X1854.08.01, X1854.09.01, X1854.10.01, X1854.11.01, X1854.12.01, X1855.01.01, X1855.02.01, X1855.03.01, ...
We can see the raster stack has the same number of layers as the size of the time dimension (1994), indicating that there is a raster for each point of time data. The names confirm this, with each layer having a date as the name (it looks like the raster library has parsed the date into a format we can read easily, so the conversion will be straightforward). The coordinate reference system is WGS84, which you could guess from the range of coordinate values when we first opened the data.
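One thing to note is that the extent runs from -1 to 359 rather than the familiar -180 to 180. I will keep the 0-360 layout throughout, but if you prefer Atlantic-centred maps, the raster package's rotate() can shift the grid. A sketch, not used in the analysis below:

```r
# shift the grid from the 0-360 longitude convention to -180..180
# rotate() accepts any Raster* object, so it works on the full stack
sst_rast_rotated <- rotate(sst_rast)
extent(sst_rast_rotated) # extent is now centred on the prime meridian
```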
First let’s take a look at the underlying geometry to see how the raster is laid out.
# convert raster stack to points
# each point is an xy - a cell of the raster, along with the associated data
# that we found with the previous nc open (time, sst)
sst_df <- data.frame(rasterToPoints(sst_rast))
# plot geometry
sst_geom <- sst_df %>% select(c(x, y)) # extract geometry from df
# convert to simple feature. use WGS84 (epsg:4326) as that was the CRS listed in the raster stack
sst_geom_sf <- st_as_sf(sst_geom, coords=c('x', 'y'), crs=4326)
# note here that I am not going to save plots as variables except where necessary
# this code already saves a fair bit in memory so I want to reduce this where I
# can. the plots will show in markdown, and if you want to save them as image
# files you can modify the code
# make map of raster stack geometry
tm_shape(sst_geom_sf) +
tm_dots(size=0.001, col='red', title='Data point') + # point data
# set title, size, position. make north arrow and scale appear outside of the map
tm_layout(main.title='SST data points',
main.title.size=1,
main.title.position='center',
frame=FALSE,
attr.outside=TRUE) +
tm_scale_bar(position=c('left', 'top'), breaks=c(0, 2500, 5000, 7500, 10000)) + # add scale bar at set breaks (km)
tm_compass(size=1, position=c('right', 'top')) + # add north arrow
tm_grid() + # add x/y reference grid
tm_xlab('Longitude') + # set x label (longitude)
tm_ylab('Latitude') + # set y label (latitude)
tm_add_legend(type='symbol', label='Data point', col='red') + # add legend
tm_legend(position=c(0.6, -0.3)) # change legend position
Looks like the raster is a grid with no values on land - makes sense as the data are ocean-based. What about the underlying raster values?
I am going to convert the data into long format and add some more information that will be helpful when picking out trends, like climate zone (e.g. antarctic, tropical).
# create long version of SST data
# this will have rows for each x, y, date and variable
# I will also classify the climate zone of each point based on lat/lon
sst_df_long <- sst_df %>%
# move SST to long, retain x and y, name by date
pivot_longer(cols=-c('x', 'y'), names_to='date', values_to='sst') %>%
# format dates. We will use different dates for plotting throughout
# I am using lubridate to change date formats, get month and year etc.
mutate(date_day=as_date(date, format='X%Y.%m.%d'), date_month=format(date_day, '%Y-%m')) %>%
mutate(year=year(date_day), month=month(date_day),
# define hemispheres
# assumption: values of exactly 0 deg lat fall in the southern hemisphere
# and values of exactly 180 deg lon fall in the eastern hemisphere
lat_hem=case_when(y > 0 ~ 'northern', y <= 0 ~ 'southern'),
lon_hem=case_when(x < 180 ~ 'western', x >= 180 ~ 'eastern'),
# define tropical/subtropical climates
zone=case_when((y > 23 & y <= 66) ~ 'northern temperate zone',
(y > 66) ~ 'arctic',
(y <= 23 & y >= -23) ~ 'tropical',
(y < -23 & y >= -66) ~ 'southern temperate zone',
(y < -66) ~ 'antarctic')) %>%
select(-date) # remove original date
head(sst_df_long) # print the start of the long data to show structure
## # A tibble: 6 × 10
## x y sst date_day date_month year month lat_hem lon_hem zone
## <dbl> <dbl> <dbl> <date> <chr> <dbl> <dbl> <chr> <chr> <chr>
## 1 0 88 -1.80 1854-01-01 1854-01 1854 1 northern western arctic
## 2 0 88 -1.80 1854-02-01 1854-02 1854 2 northern western arctic
## 3 0 88 -1.80 1854-03-01 1854-03 1854 3 northern western arctic
## 4 0 88 -1.80 1854-04-01 1854-04 1854 4 northern western arctic
## 5 0 88 -1.80 1854-05-01 1854-05 1854 5 northern western arctic
## 6 0 88 -1.80 1854-06-01 1854-06 1854 6 northern western arctic
range(sst_df_long$date_day) # print date range
## [1] "1854-01-01" "2020-02-01"
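Before plotting the distribution, a quick grouped summary shows how strongly mean SST differs between the climate zones defined above. A sketch using the sst_df_long built in the previous chunk:

```r
# mean SST per climate zone across all cells and months, ignoring missing values
sst_df_long %>%
  group_by(zone) %>%
  summarise(mean_sst = mean(sst, na.rm = TRUE)) %>%
  arrange(desc(mean_sst))
```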
# plot distribution of values
ggplot(sst_df_long, aes(x=sst)) + # dist of sst
geom_histogram() + # histogram plot
theme_bw() + # set theme
# set labels
labs(x='SST (\u00B0C)', y='Count of data points', title='Distribution of all SST values') +
scale_y_continuous(labels=label_comma()) + # format number to have commas, not sci notation
scale_x_continuous() + # set x axis to continuous
theme(plot.title=element_text(hjust=0.5)) # centre plot title
The distribution is heavily skewed towards the lower and higher ends. The data range from the start of 1854 to the start of 2020. Now let’s try to visualise SST using the raster stack. We know the rasters are stacked by time, so I will pick out a couple of rasters based on this attribute to map SST at two time points.
# I have defined these breaks after looking at the distribution of values
# these should present the data well
sst_val_brks <- c(-5, 0, 5, 10, 15, 20, 25, 30, 35)
# function for producing sst maps
# I will do this a lot so making a function to not repeat code
# will not parameterize things like textNA, data title because the function
# is designed for a single data source that I know
get_sst_map <- function(data, title, palette='-Spectral', style='cont', # set default parameters
palette_mid=15,
scale_breaks=c(0, 2500, 5000, 7500, 10000),
colourblind=FALSE) {
# use a colour-blind (deuteranopia, protanopia, tritanopia) friendly palette
# if the user requests it. this is extremely important when mapping with a
# high-range colour scheme. this could just be done with the palette parameter,
# but I think it's nice to not make a colourblind person have to go through
# colour palettes themselves
if (colourblind) {
palette <- 'BrBG'
}
# create map
map <- tm_shape(data) + # shape around input data
# we know the input will be of raster type, so plot with raster
# use parameter values where it might change
tm_raster(palette=palette, midpoint=palette_mid, colorNA='gray', textNA='Land',
breaks=sst_val_brks, title='SST (\u00B0C)', style=style) +
# adjust map. set legend position, title and sizing
tm_layout(legend.outside.position=c('right'),
legend.outside.size=0.2,
legend.outside=TRUE,
legend.title.size=1,
main.title=title,
main.title.size=1,
panel.label.size=0.05,
legend.width=1) +
tm_scale_bar(position=c(0.01, 0), breaks=scale_breaks) + # add scale bar
tm_compass(size=0.8, text.size=0.8, position=c(0.93, 0.03)) # add north arrow
return(map) # return map object
}
# save map at 01/01/1854
sst_185401 <- get_sst_map(sst_rast$X1854.01.01, 'SST at 01/01/1854',
style='pretty') # use pretty (classified) style
# save map at 01/01/2019
sst_201901 <- get_sst_map(sst_rast$X2019.01.01, 'SST at 01/01/2019',
style='pretty') # use pretty (classified) style
# plot both maps side-by-side
tmap_arrange(sst_185401, sst_201901, ncol=1, asp=2)
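A complementary static view is the cell-wise change between the two mapped dates. Raster arithmetic in the raster package operates cell by cell, so the difference is a one-liner. A quick sketch; I use the default plot rather than get_sst_map because that function's fixed breaks are not centred on zero:

```r
# cell-wise change in SST between January 1854 and January 2019
sst_diff <- sst_rast$X2019.01.01 - sst_rast$X1854.01.01
# quick default plot for a sanity check; a diverging palette centred
# on zero would be better for a polished figure
plot(sst_diff, main = 'SST change, 01/1854 to 01/2019 (\u00B0C)')
```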
The temperature looks to be increasing, but that is only 2 time points out of a total of 1994. I thought I would visualise all data by creating an animation of SST over time, with each frame in the animation being a plot at a given time point.
# I found that the animation functions (eg. gganimate) could not really handle
# 1994 unique visualisations. So I output each plot as a frame that I then
# convert to a video. This requires storage of the rendered frames (images)
# of around 250mb, but does mean that all the processing can happen in
# advance and so the visualisation itself is very smooth. The images take
# about 5 minutes to save and the video processing takes about 30 seconds.
# I have included the code to create the visualisation here, as well as the
# output in the folder in case you want to save time.
# If you want to stitch the images manually in ffmpeg, you can use:
# 'cat $(find . -maxdepth 1 -name "*.jpeg" | sort -V) | ffmpeg
# -framerate 15 -i - sst_vis.mp4'
sst_vis_fname <- 'plots/sst_animation.mp4' # set output file name
# check if the output file exists. if it does not already exist, begin processing
if (!file.exists(sst_vis_fname)) {
# for each raster in the raster stack
for (x in 1:dim(sst_rast)[3]) {
# create a frame identifier, in this case the order the raster is in the
# stack (which is sorted by date). add leading zeros for file sorting
frame_id <- str_pad(x, 4, pad = '0')
fname <- paste('vis_frames/', frame_id, '.jpeg', sep='') # create filename for frame
# parse the date of the current frame to use as plot title
# have to account for the leading 'X' in the raster name
map_date <- as_date(names(sst_rast[[x]])[[1]], format='X%Y.%m.%d')
# get map using function
# use continuous style to have a smoother representation of the data
# this is important when scrolling through frames in a video
cur_map <- get_sst_map(sst_rast[[x]], title=paste(map_date), style='cont')
# save map as file, with set size that works for aspect ratio of data
tmap_save(cur_map, fname, width=1500, height=800, dpi=200)
}
# obtain list of all frames (all jpeg files in output folder)
list_of_frames <- list.files('vis_frames', pattern='*.jpeg', full.names=T)
# encode video - stitch all frames together at a frame rate of 20 frames per second
av_encode_video(list_of_frames, framerate=20, output=sst_vis_fname)
}
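As noted in the setup section, the rendered frames are only needed until the video is encoded. If you want the storage back, they can be deleted afterwards (only run this once you are happy with the output video):

```r
# remove the intermediate frame images now that the video exists
# unlink with recursive=TRUE deletes the folder and all of its contents
unlink('vis_frames', recursive = TRUE)
```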